Method and apparatus for reducing an interference noise signal fraction in a microphone signal

ABSTRACT

The invention discloses a method of reducing an interference noise signal fraction in a microphone signal, which method is based on estimating the interference noise signal fraction from a virtually pure interference noise signal and does not require any additional microphones. It is an essential feature of the method according to the invention that the signal which is used as a basis for estimating the interference noise signal fraction in the microphone signal of interest is received by means of one or more inversely operated loudspeakers. There is no need to install further microphones, particularly in situations where there are already one or more loudspeakers as components of an audio system. Such a situation arises for example in any motor vehicle fitted with an audio system.

The invention relates to a method of reducing an interference noise signal fraction in a microphone signal. The invention furthermore relates to an apparatus for reducing an interference noise signal fraction in a microphone signal.

Such methods are particularly important for improving the quality of speech signals that are fed to a speech recognition device or to a telecommunications device. One important application from the telecommunications sector is hands-free devices, which nowadays must by law be used for making telephone calls in motor vehicles. With the aid of such hands-free devices, the driver can communicate with a remote conversation partner without taking his hands off the steering wheel and hence without taking his eyes off the road.

The example of hands-free devices clearly illustrates the two main types of interference noise whose elimination from the speech signal transmitted to the remote conversation partner is the aim of the methods under consideration.

Firstly, there is the interference noise that comes from one or more known sound sources. In the case of hands-free devices in cars, this is, for example, the sound produced by the loudspeaker of the hands-free device or by the loudspeakers of an audio system. If, for example, the speech signal of the remote conversation partner that is reproduced by the loudspeaker of the hands-free device reaches the microphone and is not removed from the microphone signal, the remote conversation partner will hear an echo of his own voice, which is perceived as highly unpleasant. The methods used to remove such interference noise fractions from the microphone signal require knowledge of the signal that produces the interference noise. In the example described above, this is the speech signal of the remote conversation partner that is fed to the loudspeaker of the hands-free device. Such methods are described, for example, in EP 0 948 237 A2 and in DE 41 06 405 A1.

The second type of interference noise comprises noise whose origin is not precisely known and which is generally produced by a large number of sound sources that are not precisely defined. Typical ambient noise belongs to this type. In the example of a hands-free device in a motor vehicle, the driving noise of the car belongs to this type of interference noise. A large group of methods for reducing interference noise of this type are based on estimating the interference noise fraction from the microphone signal itself. The interference noise signal fraction in the microphone signal is then reduced with the aid of this estimate, for example using the method of spectral subtraction. One method from this group is described, for example, in U.S. Pat. No. 6,363,345 B1. However, estimating the interference noise fraction from the microphone signal poses the problem that those sections of the microphone signal which contain only an interference noise signal fraction and no useful signal fraction must be detected. In the case of a hands-free device in a motor vehicle, these would be the sections of the microphone signal that contain no speech signal fraction. To detect such signal sections, an additional signal processing step, so-called voice activity detection (VAD), is necessary. However, VAD often supplies only unreliable results, particularly when the signal-to-noise ratio (SNR) of the microphone signal is poor. Moreover, it must be assumed that the interference noise estimate made in a speech-free section remains valid at later points in time. This assumption is, however, only a poor approximation, particularly for interference noise that changes rapidly over time combined with long speech signal sections.

It is therefore an object of the present invention to specify a method for reducing an interference noise signal fraction in a microphone signal which allows a good estimate of the interference noise signal fraction, and hence a good reduction of that fraction in the microphone signal, with a low signal processing outlay.

The above-mentioned object is achieved according to the invention by a method comprising the steps as claimed in claim 1. The dependent claims contain advantageous refinements and developments of the method as claimed in claim 1.

According to the method of the invention, the interference noise reference signal or signals used as a basis for estimating the interference noise signal fraction in the microphone signal of interest are each received by means of an inversely operated loudspeaker, that is to say a loudspeaker operated as a microphone.

The loudspeaker is positioned such that, in the associated interference noise reference signal, the signal fraction coming from the interference noise source is at least as high as the signal fraction coming from the speech signal source. If the signal-to-noise ratio (SNR) customary in signal processing is used, with the signal fraction from the speech signal source taken as the signal and the signal fraction from the interference noise source taken as the noise, this corresponds to an SNR of less than or equal to 0 dB. Preferably, the signal fraction coming from the interference noise source in the associated interference noise reference signal is even twice as high as the signal fraction coming from the speech signal source, which corresponds to an SNR of around −6 dB. Positioning the loudspeaker in this way ensures that the information about the interference noise signal fraction obtained from the loudspeaker signals is falsified only slightly by speech signal fractions. In the method according to the invention there is no need to install additional microphones, particularly in situations where one or more loudspeakers are already present as components of an audio system.
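The SNR figures above can be checked with a short calculation. This sketch assumes that "twice as high" refers to amplitude (RMS) values, for which SNR = 20·log10(S/N); for power values the same ratio of 2 would give only about −3 dB. The function name `snr_db` is illustrative.

```python
import math

def snr_db(signal_rms: float, noise_rms: float) -> float:
    """SNR in dB for amplitude (RMS) values: 20 * log10(signal / noise)."""
    return 20.0 * math.log10(signal_rms / noise_rms)

# Noise fraction equal to the speech fraction: the 0 dB boundary case.
print(round(snr_db(1.0, 1.0), 2))  # 0.0
# Noise fraction twice the speech fraction (amplitudes): about -6 dB.
print(round(snr_db(1.0, 2.0), 2))  # -6.02
```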

The estimate of the interference noise signal fraction from the loudspeaker signals, which are also referred to as interference noise reference signals, is determined in one or two steps, depending on whether there is just one or several such signals. If only one interference noise reference signal is available, a method of signal estimation theory, for example a recursive noise estimate, is applied to this signal, and the estimate of the interference noise signal fraction is thus determined directly. If there is more than one interference noise reference signal, in the first step a method of signal estimation theory, for example the recursive noise estimate, is applied to each of these signals, yielding one provisional estimate of the interference noise signal fraction per signal. In the second step, these provisional estimates are combined by linear superposition to obtain the desired estimate of the interference noise signal fraction. The linear superposition is preferably carried out by first multiplying each provisional estimate by its own weighting factor and then summing the weighted provisional estimates. The weighting factors reflect the transmission channel characteristic of the corresponding loudspeaker signal. In qualitative terms, the further away a loudspeaker is positioned from the speech signal source, the greater the attenuation of the speech signal in this loudspeaker and consequently the greater the associated weighting factor.

Once the estimate of the interference noise signal fraction has been determined, it is deducted from the microphone signal, for example using optimal filtering, which finally yields the clean microphone signal, that is to say the microphone signal reduced by the interference noise signal fraction. In optimal filtering, the frequency response of a filter, known as the optimal filter or Wiener filter, is calculated from the estimate of the interference noise signal fraction and the microphone signal, and the interference noise signal fraction is deducted from the microphone signal by applying this filter to the microphone signal. This may take place either in the time domain or in the frequency domain. Further methods for deducting the interference noise signal fraction from the microphone signal are, for example, spectral subtraction and non-linear spectral subtraction.
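A minimal sketch of the optimal-filtering variant in the frequency domain, assuming one common form of the Wiener gain in which the speech power is approximated as max(P − N, 0), so that H = S/(S + N) ≈ max(P − N, 0)/P. The function names and the flooring constant `eps` are assumptions for illustration, not part of the described method.

```python
import numpy as np

def wiener_gain(P, N, eps=1e-12):
    """Frequency response of a Wiener-type filter for one block.

    P: power spectrum of the microphone block.
    N: estimate of the interference noise power spectrum.
    Speech power is approximated as max(P - N, 0); since S + N = P,
    the gain S / (S + N) becomes max(P - N, 0) / P.
    """
    P = np.asarray(P, dtype=float)
    N = np.asarray(N, dtype=float)
    speech = np.maximum(P - N, 0.0)
    return speech / np.maximum(P, eps)  # eps avoids division by zero

def apply_wiener(X, N):
    """Filter the complex spectrum X with the gain derived from |X|^2 and N."""
    P = np.abs(X) ** 2
    return wiener_gain(P, N) * X
```

Bins where the noise estimate reaches or exceeds the measured power receive zero gain, while bins dominated by speech pass almost unchanged.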

In another refinement of the method according to the invention, besides the interference noise reference signals received by the loudspeakers and the estimate of the interference noise signal fraction resulting therefrom, referred to hereinbelow as the first estimate, the microphone signal itself is also used to determine a second estimate of the interference noise signal fraction. In a further step, the first and second estimates are combined by linear superposition, just like the provisional estimates when there are several interference noise reference signals, and the desired estimate of the interference noise signal fraction is thus determined.
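This refinement can be sketched as follows. The text does not specify how the second estimate is obtained from the microphone signal; here it is assumed, purely for illustration, to be a recursive smoothing that is frozen whenever a hypothetical voice-activity flag `speech_present` is set. The weights `w_first` and `w_second` are likewise assumed values that would need to be optimized.

```python
import numpy as np

def update_mic_estimate(n_mic, P_mic, alpha=0.99, speech_present=False):
    """Hypothetical second estimate from the microphone power spectrum.

    Recursive smoothing N = alpha*N + (1-alpha)*P, held constant while
    the (assumed) voice-activity flag reports speech.
    """
    n_mic = np.asarray(n_mic, dtype=float)
    if speech_present:
        return n_mic
    return alpha * n_mic + (1.0 - alpha) * np.asarray(P_mic, dtype=float)

def combine_first_second(n_first, n_second, w_first=0.7, w_second=0.3):
    """Linear superposition of the reference-based (first) and
    microphone-based (second) estimates; the weights are assumptions."""
    return w_first * np.asarray(n_first, dtype=float) + w_second * np.asarray(
        n_second, dtype=float
    )
```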

A wide variety of uses are conceivable for the clean microphone signal obtained using the method according to the invention. For instance, it may be fed to a telecommunications device and thus transmitted to a remote conversation partner, which improves the quality of the received speech signal for that conversation partner. In a further use, the clean microphone signal may be fed to a speech recognition device, which improves the recognition performance of this system.

In a further refinement of the method according to the invention, the microphone signal and the at least one interference noise reference signal are received in a means of transport, for example a motor vehicle, and the loudspeakers used form part of an already existing loudspeaker system. This is particularly advantageous in a motor vehicle, since the loudspeakers there are generally positioned such that the interference noise signal fraction in the signals they receive is at least as high as the speech signal fraction coming from a speaker sitting in the driver's seat.

The invention furthermore relates to an apparatus for carrying out the method as claimed in claim 1. The apparatus comprises a signal processor on which the estimate of the interference noise signal fraction is determined and deducted from the microphone signal. The apparatus furthermore comprises at least one microphone that is coupled to the signal processor. This coupling may be effected, for example, by means of a line or wirelessly, and a so-called codec for the analog/digital conversion of the microphone signal is usually connected in between. The apparatus likewise comprises at least one loudspeaker that is operated as a microphone and is likewise coupled to the signal processor. Here too, the coupling may be effected, for example, by means of a line or wirelessly, and a codec for the analog/digital conversion of the loudspeaker signal may be connected in between. Besides the processing steps belonging to the method according to the invention, further data processing steps may also be carried out on the signal processor. In particular, the signal processor may form part of an already existing data processing device and additionally be used for the method according to the invention.

The invention will be further described with reference to examples of embodiments shown in the drawings, to which, however, the invention is not restricted.

FIG. 1 shows a block diagram to illustrate the method according to the invention.

FIG. 2 shows a flowchart which illustrates the determination of a provisional estimate of an interference noise signal fraction.

FIG. 3 shows a flowchart which illustrates the combining of the provisional estimates of the interference noise signal fraction for determining an estimate of the interference noise signal fraction.

FIG. 4 shows a flowchart which illustrates the deduction of the estimate of the interference noise signal fraction from a microphone signal.

FIG. 1 shows a block diagram of an arrangement for carrying out the method according to the invention. A microphone signal x, which is to be freed of an interference noise signal fraction using the method according to the invention, is recorded using a microphone 101 and fed to a deduction unit 501, which deducts the estimate of the interference noise signal fraction from the microphone signal. Loudspeakers 201, 202 and 203 are used as microphones in a known manner to record interference noise reference signals x₁, x₂ and x₃. The choice of three loudspeakers, and accordingly three interference noise reference signals, is merely an example and in no way obligatory. Rather, starting from at least one loudspeaker and accordingly one interference noise reference signal, the number may be chosen as desired and is limited at most by the resulting signal processing outlay. The three interference noise reference signals x₁, x₂ and x₃ are fed to estimation units 301, 302 and 303, respectively. In each of these estimation units, a provisional estimate of the interference noise signal fraction is determined. These provisional estimates, designated N₁, N₂ and N₃ in FIG. 1, are subsequently fed to a combination unit 401. This combination unit 401 combines the provisional estimates of the interference noise signal fraction and thus determines an estimate of the interference noise signal fraction, designated N in FIG. 1. This estimate is then fed to the deduction unit 501 as a second input signal alongside the microphone signal. Within this deduction unit 501, the estimate of the interference noise signal fraction is deducted from the microphone signal, yielding a clean signal x′.

FIG. 2 shows a flowchart which illustrates the mode of operation of the estimation unit 301. Within this estimation unit 301, the provisional estimate of the interference noise signal fraction N₁ is calculated from the signal x₁ received by means of the loudspeaker 201. The estimation units 302 and 303 operate identically. First, the signal x₁ is digitized by an analog/digital conversion 310 at a sampling rate of 8 kHz. A block of M digital sample values of the signal x₁ is then formed by a so-called framing 311. This block is composed of the last M−B sample values of the previous block and the B current sample values of the signal x₁. The signal processing thus takes place in successive blocks of M sample values which overlap by M−B sample values, B new sample values being processed in each case. If M = 256 and B = 128 are selected, then, at a sampling rate of 8 kHz, a block corresponds to a duration of 32 ms and successive blocks overlap by 16 ms, that is to say by 50%. In a subsequent windowing 312, the M sample values of the block are multiplied by the values of a window function, for example a Hamming window, in order to reduce disruptive effects of the framing at the subsequent transition into the frequency domain. The windowed sample values determined in this way are then transformed into the frequency domain by means of a discrete Fourier transform 313. In the next processing step 314, the absolute square of each of the M complex Fourier coefficients is formed, giving the power spectrum P₁(f,i). Here, f is the frequency and i is the index of the current block, which is related to time via the block shift B and the sampling rate. This power spectrum is then smoothed by a recursive smoothing 315 according to the formula

$N_1(f,i) = \alpha \cdot N_1(f,i-1) + (1-\alpha) \cdot P_1(f,i)$

giving the provisional estimate N₁(f,i) of the interference noise signal fraction in the frequency domain.
The smoothing coefficient α is a parameter of the method that has to be optimized; a typical value is, for example, α = 0.99. It should be noted at this point that the provisional estimate of the interference noise signal fraction does not necessarily have to be determined in the frequency domain; implementations in the time domain are also conceivable.
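The processing chain of FIG. 2 (framing 311, windowing 312, discrete Fourier transform 313, power spectrum 314 and recursive smoothing 315) might be sketched with NumPy as follows, using the example parameters M = 256, B = 128, 8 kHz and α = 0.99 from the text. The analog/digital conversion 310 is assumed to have already taken place, and initialising the recursion with the first block's power spectrum is an assumption, since the text does not state how N₁ is initialised.

```python
import numpy as np

FS = 8000     # sampling rate in Hz
M = 256       # block length: 32 ms at 8 kHz
B = 128       # new samples per block: 50 % overlap
ALPHA = 0.99  # smoothing coefficient

def power_spectra(x, M=M, B=B):
    """Frame x into blocks of M samples advanced by B samples, apply a
    Hamming window, and return |DFT|^2 per block, shape (blocks, M).
    x must contain at least M samples."""
    x = np.asarray(x, dtype=float)
    window = np.hamming(M)
    n_blocks = 1 + (len(x) - M) // B
    P = np.empty((n_blocks, M))
    for i in range(n_blocks):
        block = x[i * B : i * B + M] * window
        P[i] = np.abs(np.fft.fft(block)) ** 2
    return P

def recursive_noise_estimate(P, alpha=ALPHA):
    """N(f,i) = alpha * N(f,i-1) + (1 - alpha) * P(f,i).

    The recursion is started from the first block's power spectrum
    (an assumption of this sketch)."""
    P = np.asarray(P, dtype=float)
    N = np.empty_like(P)
    N[0] = P[0]
    for i in range(1, len(P)):
        N[i] = alpha * N[i - 1] + (1.0 - alpha) * P[i]
    return N
```

For one second of signal at 8 kHz, `power_spectra` yields 1 + (8000 − 256) // 128 = 61 blocks; with α = 0.99 the estimate has an effective memory of roughly 100 blocks, that is, about 1.6 s at a block shift of 16 ms.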

FIG. 3 shows a flowchart illustrating the mode of operation of the combination unit 401. The provisional estimates of the interference noise signal fraction N₁, N₂ and N₃, determined in the estimation units 301, 302 and 303 in the manner described above, are first multiplied by weighting factors β₁, β₂ and β₃, respectively. These weighting factors are further parameters of the method according to the invention that need to be optimized, and they reflect the transmission channel characteristic of the corresponding loudspeaker signal. In qualitative terms, the further away a loudspeaker is positioned from the speech signal source, the greater the attenuation of the speech signal in this loudspeaker and consequently the greater the associated weighting factor β. Once all the provisional estimates have been multiplied by their respective weighting factors, the estimate of the interference noise signal fraction N is given as the sum of these products:

$N(f,i) = \sum_{k} \beta_k \cdot N_k(f,i)$

It should be noted that in the case of just one loudspeaker, and accordingly just one interference noise reference signal, the processing step within the combination unit 401 is omitted and the provisional estimate N₁(f,i) is identical to the estimate N(f,i) of the interference noise signal fraction.
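The weighted summation performed by the combination unit 401, including the single-reference case, could look like this. The function name and the choice to raise an error when the number of weighting factors does not match the number of estimates are assumptions of this sketch.

```python
import numpy as np

def combine_provisional(estimates, weights=None):
    """N(f,i) = sum_k beta_k * N_k(f,i).

    estimates: sequence of arrays N_k(f,i) of equal shape.
    weights:   weighting factors beta_k, one per estimate.
    With a single reference signal the combination step is skipped and
    the provisional estimate is returned unchanged.
    """
    estimates = [np.asarray(n, dtype=float) for n in estimates]
    if len(estimates) == 1:
        return estimates[0]
    if weights is None or len(weights) != len(estimates):
        raise ValueError("one weighting factor per provisional estimate is required")
    return sum(beta * n for beta, n in zip(weights, estimates))
```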

FIG. 4 uses a flowchart to illustrate the mode of operation of the deduction unit 501, in which the last step of the method according to the invention, the deduction of the estimate of the interference noise signal fraction from the microphone signal, is carried out. First, analogously to the loudspeaker signal x₁ in FIG. 2, the microphone signal x is subjected to analog/digital conversion 510, framing 511, windowing 512, transformation into the frequency domain 513 and calculation 514 of the power spectrum P(f,i) as the absolute square of the complex Fourier coefficients. Besides the power spectrum, the phase φ(f,i) of the complex Fourier coefficients X(f,i) is also calculated in a processing step 515. A clean power spectrum P′(f,i) is then calculated from the estimate N(f,i) of the interference noise signal fraction determined in the combination unit 401 and from the power spectrum P(f,i) of the microphone signal by means of a non-linear spectral subtraction 516 according to the formula

$P'(f,i) = \max\{P(f,i) - a(f,i) \cdot N(f,i),\; b \cdot N(f,i)\}$

Here, the so-called overestimation factor a(f,i) and the so-called floor factor b are parameters of the method according to the invention that have to be optimized. For the method of non-linear spectral subtraction, reference is made to Bouquin, R. L., “Enhancement of noisy speech signals: Applications to mobile radio communications”, Speech Communication, Vol. 18, 1996. In processing step 517, a clean spectrum of complex Fourier coefficients X′(f,i) is then calculated from the clean power spectrum and the previously calculated, unchanged phase φ(f,i) according to the equation

$X'(f,i) = \sqrt{P'(f,i)} \cdot e^{j\varphi(f,i)}$

where j denotes the imaginary unit. Finally, the clean microphone signal x′ is obtained from this clean spectrum by an inverse Fourier transform 518 and a procedure 519 that is the inverse of framing, according to the so-called overlap-add method.
It should again be noted at this point that the subtraction need not be carried out in the frequency domain; methods in the time domain are also conceivable.
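Steps 514 to 519 of FIG. 4 can be sketched per block as below. The values a = 2.0 and b = 0.01 are placeholders (the text leaves both to optimization, and a may additionally vary with f and i). Normalising the overlap-add output by the summed analysis windows is an assumption added here so that an unprocessed signal is reconstructed exactly.

```python
import numpy as np

def nonlinear_spectral_subtraction(P, N, a=2.0, b=0.01):
    """P'(f,i) = max(P(f,i) - a*N(f,i), b*N(f,i)); a and b are placeholder values."""
    return np.maximum(P - a * N, b * N)

def enhance_block(X, N, a=2.0, b=0.01):
    """Clean complex spectrum sqrt(P') * exp(j*phi) of one block,
    keeping the unchanged phase of the microphone spectrum X."""
    P = np.abs(X) ** 2
    phi = np.angle(X)
    P_clean = nonlinear_spectral_subtraction(P, N, a, b)
    return np.sqrt(P_clean) * np.exp(1j * phi)

def overlap_add(blocks, B, window):
    """Inverse DFT of each clean spectrum, overlap-added with hop B.

    The real part is taken because N(f,i) is assumed symmetric in f
    (as it is for power spectra of real signals), so the inverse DFT is
    real up to rounding. Dividing by the summed analysis windows undoes
    the gain introduced by the windowing."""
    M = len(window)
    n = (len(blocks) - 1) * B + M
    y = np.zeros(n)
    wsum = np.zeros(n)
    for i, X_clean in enumerate(blocks):
        y[i * B : i * B + M] += np.real(np.fft.ifft(X_clean))
        wsum[i * B : i * B + M] += window
    return y / np.maximum(wsum, 1e-12)
```

With N = 0 the clean spectrum equals the input spectrum, so framing, `enhance_block` and `overlap_add` together reproduce the original samples, which is a convenient sanity check before tuning a and b.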

1. A method of reducing an interference noise signal fraction in a microphone signal which contains the interference noise signal fraction coming from at least one interference noise source and a speech signal fraction coming from a speech signal source, said method comprising the following steps: reception of the microphone signal containing the interference noise signal fraction and the speech signal fraction; reception of at least one interference noise reference signal by means of in each case one inversely operated loudspeaker, where the loudspeaker or loudspeakers are positioned such that the signal fraction coming from the interference noise sources in the respective interference noise reference signal is at least as high as the signal fraction coming from the speech signal source in this interference noise reference signal; in the case of just one interference noise reference signal, determination of an estimate of the interference noise signal fraction from the interference noise reference signal using a method of signal estimation theory; in the case of more than one interference noise reference signal, determination of in each case one provisional estimate of the interference noise signal fraction from each of the interference noise reference signals using a method of signal estimation theory, and subsequent determination of the estimate of the interference noise signal fraction in the microphone signal by combining these provisional estimates of the interference noise signal fraction; reduction of the interference noise signal fraction in the microphone signal by deducting the estimate of the interference noise signal fraction from the microphone signal.
2. A method as claimed in claim 1, characterized in that, in an additional method step, besides the determination of a first estimate of the interference noise signal fraction by means of at least one interference noise reference signal, a second estimate of the interference noise signal fraction is determined by means of the microphone signal itself and a third estimate is determined from a linear combination of the first and second estimates of the interference noise signal fraction, and in that the reduction of the interference noise signal fraction in the microphone signal is effected by deducting this third estimate from the microphone signal.
3. A method as claimed in claim 1, characterized in that, in the case of more than one interference noise reference signal, the combination of the provisional estimates of the interference noise signal fraction consists of multiplying each provisional estimate of the interference noise signal fraction by its own weighting factor and subsequently summing the weighted provisional estimates thus obtained.
4. A method as claimed in claim 1, characterized in that the deduction of the estimate of the interference noise signal fraction from the microphone signal is carried out using optimal filtering.
5. A method as claimed in claim 1, characterized in that the deduction of the estimate of the interference noise signal fraction from the microphone signal is carried out using the method of spectral subtraction.
6. A method as claimed in claim 1, characterized in that the microphone signal reduced by the interference noise signal fraction is fed to a speech recognition device.
7. A method as claimed in claim 1, characterized in that the microphone signal reduced by the interference noise signal fraction is fed to a telecommunications device.
8. A method as claimed in claim 1, characterized in that the microphone signal and the at least one interference noise reference signal are received in a means of transport and the loudspeaker or loudspeakers used form part of a loudspeaker system present in the means of transport.
9. An apparatus for carrying out the method as claimed in claim 1, comprising at least the following components: a signal processor for determining the estimate of the interference noise signal fraction and for deducting this estimate from the microphone signal; at least one microphone which is coupled to the signal processor and is provided as a receiver for the microphone signal; at least one loudspeaker which is coupled to the signal processor and is provided as a receiver for the interference noise reference signal.